CDX sorting: Default list sort is more efficient #609

Merged: 1 commit, Jan 26, 2021

Conversation

ikreymer
Member

Description

As mentioned in #608 and explained in https://stackoverflow.com/a/53023435,
the CDX sorting that pywb uses turns out to be less efficient than the default list sort.

Motivation and Context

In fact, the current sorting appears to end up being O(n^2) instead of O(n * log n), which would cause significant performance issues for larger indexes.

This switches to simply aggregating all CDX entries and sorting once at the end.
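To illustrate the difference, here is a minimal sketch, not the actual pywb code. It assumes the earlier approach resembled keeping a list sorted on every insert (for example via `bisect.insort`), where each insert has to shift elements and the total cost grows quadratically. The function names and the sample CDX lines are hypothetical.

```python
import bisect


def sort_incrementally(cdx_lines):
    """Keep the result sorted after every insert.

    Finding the position is O(log n), but inserting into a Python list
    shifts elements, so each insert is O(n) and the whole loop is O(n^2).
    """
    result = []
    for line in cdx_lines:
        bisect.insort(result, line)
    return result


def sort_at_end(cdx_lines):
    """Aggregate everything first, then sort once: O(n * log n)."""
    result = list(cdx_lines)
    result.sort()
    return result


# Hypothetical CDX-style lines (url key + timestamp), for illustration only.
sample = [
    "com,example)/b 20210126000000",
    "com,example)/a 20200101000000",
    "org,example)/ 20190505120000",
    "com,example)/a 20190101000000",
]

# Both approaches produce the same ordering; only the cost differs.
assert sort_incrementally(sample) == sort_at_end(sample)
```

Both functions return the same ordering, so the simpler version is a drop-in replacement as long as nothing needs the intermediate sorted state.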

Sometimes the simpler solution is the correct one!

@ikreymer ikreymer merged commit 4683d95 into master Jan 26, 2021
@ikreymer ikreymer deleted the simplify-sort branch January 27, 2021 03:02